Skip to content

[AIROCMLIR-80] Utilize amd_arch_db python bindings where applicable#2352

Open
mirza-halilcevic wants to merge 22 commits into
developfrom
amd-arch-db-py
Open

[AIROCMLIR-80] Utilize amd_arch_db python bindings where applicable#2352
mirza-halilcevic wants to merge 22 commits into
developfrom
amd-arch-db-py

Conversation

@mirza-halilcevic
Copy link
Copy Markdown
Contributor

@mirza-halilcevic mirza-halilcevic commented Apr 19, 2026

Motivation

Arch information is usually hardcoded and duplicated across python scripts.

Technical Details

Take advantage of the amd_arch_db pybind11 module.

Test Plan

CI passes.

Test Result

  • Nighty CI
  • Weekly CI

Submission Checklist

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR reduces duplicated/hardcoded GPU architecture capability logic in Python utilities by switching to the amd_arch_db pybind module and extends the underlying arch database with additional exposed properties.

Changes:

  • Updated performance sweep/analysis scripts to query arch capabilities via amd_arch_db (MFMA/WMMA/FP8/FP4, etc.) instead of hardcoded chip lists/prefix checks.
  • Added/extended amd_arch_db Python bindings (enum arithmetic + additional feature flags/fields) and wired the module into build/test dependencies.
  • Updated lit/common Python utilities to derive feature strings and booleans from amd_arch_db.

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
mlir/utils/performance/tests/test_perfRunner.py Removes unit tests related to chiplet counting logic.
mlir/utils/performance/tests/mock_hip.py Extends CI mocks to include an amd_arch_db stub module.
mlir/utils/performance/perfRunner.py Switches dtype/feature gating and topology queries to amd_arch_db.
mlir/utils/performance/parameterSweeps.py Infers codepaths/flags via amd_arch_db feature bits (adds has_feature).
mlir/utils/performance/attentionSweeps.py Uses amd_arch_db features to pick MFMA vs WMMA for attention sweeps.
mlir/utils/performance/analysis/testing-metrics.py Derives EU-per-CU via amd_arch_db instead of hardcoded value.
mlir/utils/performance/analysis/quickTuningGen.py Uses amd_arch_db features to classify instruction type.
mlir/utils/performance/CMakeLists.txt Adds amd_arch_db as a dependency of the copied performance scripts target.
mlir/unittests/Dialect/Rock/AmdArchDbTests.cpp Extends native-vs-preset comparisons to include new fields.
mlir/test/lit.site.cfg.py.in Adds tools dir to sys.path so amd_arch_db can be imported in lit utils.
mlir/test/fusion/e2e/lit.site.cfg.py.in Uses shared feature detection helper instead of hardcoded arch checks.
mlir/test/e2e/lit.site.cfg.py.in Adds tools dir to sys.path for amd_arch_db import.
mlir/test/common_utils/common.py Replaces hardcoded feature decoding with amd_arch_db lookups.
mlir/test/CMakeLists.txt Adds amd_arch_db to test target dependencies.
mlir/lib/Dialect/Rock/utility/Bindings/CMakeLists.txt Builds amd_arch_db and attempts to make it globally importable via .pth.
mlir/lib/Dialect/Rock/utility/Bindings/AmdArchDbBindings.cpp Enables enum arithmetic and binds additional feature bits/fields.
mlir/lib/Dialect/Rock/IR/AmdArchDb.cpp Extends arch presets with hasFp4 and updates related comments.
mlir/include/mlir/Dialect/Rock/IR/RockAttrDefs.td Documents new “keep in sync” relationship for feature name mapping.
mlir/include/mlir/Dialect/Rock/IR/AmdArchDb.h Adds hasFp4 field to AmdArchInfo and updates constructor.
Comments suppressed due to low confidence (2)

mlir/utils/performance/tests/test_perfRunner.py:92

  • The tests for get_num_chiplets were removed, but the function implementation was also changed substantially in this PR. Since CI now mocks amd_arch_db, consider reintroducing unit tests that validate the new behavior (e.g., by adjusting the mock lookup_arch_info/_DEFAULT_MOCK_INFO.max_num_xcc and asserting get_num_chiplets and get_num_cu return expected values).
    def test_read_nonexistent_returns_none(self):
        db = perfRunner.read_tuning_db("/nonexistent/path.tsv")
        assert db is None


class TestParseDataTypes:
    """Tests for parse_data_types (gemm data types)."""

    def test_empty_returns_defaults(self):
        dtypes, out_map = perfRunner.parse_data_types(None)

mlir/utils/performance/parameterSweeps.py:122

  • infer_codegen_flags_from_arch no longer returns ('unknown', []) (it defaults to 'vanilla'), but the docstring still claims it can return 'unknown' and callers (e.g. main()) still branch on codepath == 'unknown'. Either restore the 'unknown' sentinel when inference fails, or update the docstring + remove/adjust the dead 'unknown' handling so behavior is consistent.
def infer_codegen_flags_from_arch(arch: str,
                                  requested_codepath: str = 'none') -> tuple[str, list[str]]:
    """Infers codepath and optional rocmlir-gen flags from architecture.

    The inferred codepath is used to pick the perf_config family. By default we
    rely on rocmlir-gen arch auto-detection and return no explicit feature
    flags; flags are only emitted when a codepath override is explicitly
    requested.

    Returns ('unknown', []) when inference fails.
    """
    supported_codepath = ['mfma', 'vanilla', 'wmma']
    codepath = requested_codepath

    if codepath not in supported_codepath:
        features = lookup_arch_info(arch).default_features
        if has_feature(features, GemmFeatures.MFMA):
            codepath = 'mfma'
        elif has_feature(features, GemmFeatures.WMMA):
            codepath = 'wmma'
        else:
            codepath = 'vanilla'

    if requested_codepath in supported_codepath:
        return (codepath, get_codegen_flags_for_codepath(arch, codepath))
    return (codepath, [])

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread mlir/utils/performance/CMakeLists.txt Outdated
Comment thread mlir/test/CMakeLists.txt Outdated
Comment thread mlir/lib/Dialect/Rock/utility/Bindings/CMakeLists.txt Outdated
Comment thread mlir/utils/performance/perfRunner.py Outdated
Comment thread mlir/utils/performance/perfRunner.py Outdated
Comment thread mlir/lib/Dialect/Rock/IR/AmdArchDb.cpp
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 5 comments.

Comments suppressed due to low confidence (2)

mlir/utils/performance/perfRunner.py:2316

  • get_num_cu() unconditionally uses lookup_arch_info("native:<id>"). In builds where ROCMLIR_ENABLE_NATIVE_ARCH is disabled, the Python binding raises ValueError for any native:* lookup, so this will crash even if the caller only needs preset arch info.

Please catch that error and fall back to a non-native CU query (e.g. HIP hipGetDeviceProperties(...).multiProcessorCount, or the previous rocminfo-based approach) so perfRunner.py remains usable when native arch support is not compiled in.

def get_num_cu(device_id: int = 0):
    # In native mode, min_num_cu contains the actual number of CUs instead of the minimum
    return lookup_arch_info(f"native:{device_id}").min_num_cu

mlir/utils/performance/analysis/testing-metrics.py:34

  • assign_num_cu() unconditionally calls lookup_arch_info("native:0"). When ROCMLIR_ENABLE_NATIVE_ARCH is OFF, the binding raises ValueError for native lookups, so the script will fail even on systems with ROCm installed.

Please catch that error and fall back to another CU count source (e.g. HIP multiProcessorCount, or preset arch info with a warning about accuracy).

def assign_num_cu():
    if args.c:
        return int(args.c)
    print(
        "Using info from GPU 0 in your system, the data should have be obtained from the same GPU.")
    return lookup_arch_info("native:0").min_num_cu

Comment thread mlir/utils/performance/perfRunner.py
Comment thread mlir/utils/performance/perfRunner.py
Comment thread mlir/test/common_utils/common.py
Comment thread mlir/utils/performance/analysis/testing-metrics.py
Comment thread mlir/utils/performance/analysis/testing-metrics.py Outdated
mirza-halilcevic and others added 4 commits May 14, 2026 15:31
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants